34 research outputs found

    GPUs as Storage System Accelerators

    Full text link
    Massively multicore processors, such as Graphics Processing Units (GPUs), provide, at a comparable price, one order of magnitude higher peak performance than traditional CPUs. This drop in the cost of computation, as any order-of-magnitude drop in the cost per unit of performance for a class of system components, triggers the opportunity to redesign systems and to explore new ways to engineer them to recalibrate the cost-to-performance relation. This project explores the feasibility of harnessing GPUs' computational power to improve the performance, reliability, or security of distributed storage systems. In this context, we present the design of a storage system prototype that uses GPU offloading to accelerate a number of computationally intensive primitives based on hashing, and introduce techniques to efficiently leverage the processing power of GPUs. We evaluate the performance of this prototype under two configurations: as a content addressable storage system that facilitates online similarity detection between successive versions of the same file, and as a traditional system that uses hashing to preserve data integrity. Further, we evaluate the impact of offloading to the GPU on competing applications' performance. Our results show that this technique can bring tangible performance gains without negatively impacting the performance of concurrently running applications. Comment: IEEE Transactions on Parallel and Distributed Systems, 201
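    As a rough illustration of the hashing-based primitives described above, the sketch below (Python, not the authors' GPU code) shows fixed-size chunking plus per-chunk hashing to detect similarity between two versions of a file; the prototype offloads the hashing step to the GPU, whereas the sketch uses CPU-side hashlib, and the chunk size and hash function are assumptions.

```python
# Minimal sketch of content-addressable similarity detection between two file
# versions: split each version into fixed-size chunks, hash every chunk, and
# count how many chunks of the new version already exist in the old one.
# The paper offloads the hashing step to the GPU; hashlib on the CPU is used
# here purely for illustration. Chunk size and hash choice are assumptions.
import hashlib

CHUNK_SIZE = 256 * 1024  # assumed chunk size (256 KB)

def chunk_hashes(data: bytes) -> list[str]:
    """Split data into fixed-size chunks and hash each chunk."""
    return [
        hashlib.sha1(data[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(data), CHUNK_SIZE)
    ]

def similarity(old_version: bytes, new_version: bytes) -> float:
    """Fraction of the new version's chunks that are unchanged from the old one."""
    old_set = set(chunk_hashes(old_version))
    new_hashes = chunk_hashes(new_version)
    if not new_hashes:
        return 1.0
    return sum(1 for h in new_hashes if h in old_set) / len(new_hashes)
```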

    Active Data: A Data-Centric Approach to Data Life-Cycle Management

    Get PDF
    Data-intensive science offers new opportunities for innovation and discoveries, provided that large datasets can be handled efficiently. Data management for data-intensive science applications is challenging: it requires support for complex data life cycles, coordination across multiple sites, fault tolerance, and scalability to tens of sites and petabytes of data. In this paper, we argue that data management for data-intensive science applications requires a fundamentally different approach than the current ad-hoc, task-centric one. We propose Active Data, a fundamentally novel paradigm for data life-cycle management. Active Data follows two principles: it is data-centric and event-driven. We report on the Active Data programming model and its preliminary implementation, and discuss the benefits and limitations of the approach on recognized challenging data-intensive science use cases.
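    To make the two principles concrete, here is a minimal, hypothetical sketch of a data-centric, event-driven life cycle: a data item carries its own state machine and handlers subscribe to state changes. The states, transitions, and publish/subscribe mechanics are illustrative assumptions, not the Active Data API.

```python
# Hypothetical sketch of a data-centric, event-driven life cycle (not the
# Active Data implementation): each data item owns a small state machine and
# notifies subscribed handlers on every legal transition.
from collections import defaultdict
from typing import Callable

# Assumed life-cycle transitions for a data item.
TRANSITIONS = {
    "created": {"replicated", "deleted"},
    "replicated": {"archived", "deleted"},
    "archived": {"deleted"},
}

class LifeCycle:
    def __init__(self, item_id: str):
        self.item_id = item_id
        self.state = "created"
        self.handlers: dict[str, list[Callable]] = defaultdict(list)

    def on(self, state: str, handler: Callable) -> None:
        """Subscribe a handler to run when the item enters `state`."""
        self.handlers[state].append(handler)

    def transition(self, new_state: str) -> None:
        """Move the item to a new state and notify subscribers."""
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        for handler in self.handlers[new_state]:
            handler(self.item_id, new_state)

# Example: trigger an integrity check whenever a dataset gets replicated.
lc = LifeCycle("dataset-42")
lc.on("replicated", lambda item, state: print(f"verify checksums of {item}"))
lc.transition("replicated")
```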

    Interacting with Large Distributed Datasets using Sketch

    Get PDF
    We present Sketch, a distributed software infrastructure for building interactive tools for exploring large datasets distributed across multiple machines. We have built three sophisticated applications using this framework: a billion-row spreadsheet, a distributed log browser, and a distributed-systems performance debugging tool. Sketch applications allow interactive and responsive exploration of complex distributed datasets, scaling gracefully to large system sizes. The conflicting constraints of large-scale data and the small timescales required by human interaction are difficult to satisfy simultaneously. Sketch hits a sweet spot in this trade-off by exploiting the observation that the precision of a data view is limited by the resolution of the user's screen. The system pushes data reduction operations to the data sources. The core Sketch abstraction provides a narrow programming interface; Sketch clients construct a distributed application by stacking modular components with identical interfaces, each providing a useful feature: network transparency, concurrency, fault tolerance, straggler avoidance, round-trip reduction, and distributed aggregation.
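    The screen-resolution argument can be sketched in a few lines (a hypothetical illustration, not Sketch's interface): each source reduces its values to a histogram with one bucket per pixel column, and the client only merges the small per-source histograms.

```python
# Illustrative sketch of pushing data reduction to the sources: bin values at
# screen resolution on each source, then merge tiny histograms at the client.
# Bucket counts and the merge step are assumptions, not Sketch's actual API.
def local_histogram(values: list[float], lo: float, hi: float, pixels: int) -> list[int]:
    """Runs at a data source: bin values into one bucket per pixel column."""
    counts = [0] * pixels
    width = (hi - lo) / pixels
    for v in values:
        if lo <= v < hi:
            counts[int((v - lo) / width)] += 1
    return counts

def merge(histograms: list[list[int]]) -> list[int]:
    """Runs at the client: combine per-source histograms bucket by bucket."""
    return [sum(bucket) for bucket in zip(*histograms)]

# Two sources and an 800-pixel-wide view: only 2 x 800 integers cross the
# network, regardless of how many rows each source holds.
h1 = local_histogram([0.1, 0.2, 5.0], lo=0.0, hi=10.0, pixels=800)
h2 = local_histogram([5.1, 9.9], lo=0.0, hi=10.0, pixels=800)
view = merge([h1, h2])
```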

    Embracing diversity: optimizing distributed storage systems for diverse deployment environments

    No full text
    Distributed storage system middleware acts as a bridge between the upper-layer applications and the lower-layer storage resources available in the deployment platform. Storage systems are expected to efficiently support the applications' workloads while reducing the cost of the storage platform. In this context, two factors increase the complexity of storage system design. First, the applications' workloads are diverse along a number of axes: read/write access patterns, data compressibility, and security requirements, to mention only a few. Second, the storage system should provide high performance within a given dollar budget. This dissertation addresses two interrelated issues in this design space. First, can the computational power of commodity massively multicore devices be exploited to accelerate storage system operations without increasing the platform cost? Second, is it possible to build a storage system that can support a diverse set of applications yet be optimized for each one of them? This work provides evidence that, for some system designs and workloads, significant performance gains can be obtained by exploiting massively multicore devices and by optimizing the storage system for a specific application. Further, my work demonstrates that these gains are possible while still supporting the POSIX API and without requiring changes to the application. Finally, while these two issues can be addressed independently, a system that includes solutions to both of them enables significant synergies.
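    One way to picture "optimized for each application" is a mapping from a declared workload profile to storage-system knobs; the profiles and settings below are invented for illustration and are not prescribed by the dissertation.

```python
# Hypothetical per-application tuning: map a workload profile to storage knobs
# such as chunk size, replication factor, and checksumming. The profile names
# and values are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class StorageConfig:
    chunk_size_kb: int
    replication: int
    checksum: bool

PROFILES = {
    "checkpointing": StorageConfig(chunk_size_kb=1024, replication=1, checksum=False),
    "read_mostly": StorageConfig(chunk_size_kb=256, replication=3, checksum=True),
    "sensitive_data": StorageConfig(chunk_size_kb=256, replication=2, checksum=True),
}

def configure(workload: str) -> StorageConfig:
    """Pick a configuration for the mounted application, with a safe default."""
    return PROFILES.get(workload, StorageConfig(chunk_size_kb=512, replication=2, checksum=True))

print(configure("checkpointing"))
```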

    Beyond music sharing: an evaluation of peer-to-peer data dissemination techniques in large scientific collaborations

    No full text
    The avalanche of data from scientific instruments and the ensuing interest from geographically distributed users to analyze and interpret it accentuates the need for efficient data dissemination. An optimal data distribution scheme will find the delicate balance between conflicting requirements of minimizing transfer times, minimizing the impact on the network, and uniformly distributing load among participants. We identify several data distribution techniques, some successfully employed by today's peer-to-peer networks: staging, data partitioning, orthogonal bandwidth exploitation, and combinations of the above. We use simulations to explore the performance of these techniques in contexts similar to those used by today's data-centric scientific collaborations and derive several recommendations for efficient data dissemination. Our experimental results show that the peer-to-peer solutions that offer load balancing and good fault tolerance properties and have embedded participation incentives lead to unjustified costs in today's scientific data collaborations deployed on over-provisioned network cores. However, as user communities grow and these deployments scale, peer-to-peer data delivery mechanisms will likely outperform other techniques.
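    A back-of-envelope model (not the paper's simulator) contrasts two of the techniques named above: staging everything through one node versus partitioning the file so peers exchange distinct chunks; bandwidths and node counts are illustrative assumptions.

```python
# Crude bandwidth model comparing staging with data partitioning; all numbers
# are assumptions for illustration only.
def staging_time_s(file_gb: float, clients: int, source_gbps: float) -> float:
    """Every client pulls the full file through one staging node's uplink."""
    return file_gb * 8 * clients / source_gbps

def partitioned_time_s(file_gb: float, clients: int,
                       source_gbps: float, peer_gbps: float) -> float:
    """Source pushes each chunk once; peers then swap the chunks they miss."""
    seed_s = file_gb * 8 / source_gbps                       # one copy leaves the source
    swap_s = file_gb * 8 * (clients - 1) / (clients * peer_gbps)
    return seed_s + swap_s

print(staging_time_s(100, clients=20, source_gbps=10))                     # 1600.0 s
print(partitioned_time_s(100, clients=20, source_gbps=10, peer_gbps=10))   # 156.0 s
```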

    Configurable Security for Scavenged Storage Systems

    No full text
    Scavenged storage systems harness unused disk space from individual workstations the same way idle CPU cycles are harnessed by desktop grid applications lik